AITopics | integrating momentum

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Neural Information Processing Systems, Dec-23-2025, 19:06:39 GMT

Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance.
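The momentum idea in this abstract can be illustrated with a small sketch. It assumes a heavy-ball style recurrence in which the input term is accumulated in an auxiliary velocity state, v_t = mu * v_{t-1} + s * W x_t, followed by h_t = tanh(U h_{t-1} + v_t). The names mu, s, W, U, momentum_rnn_step, and run_sequence are illustrative choices that do not appear in this listing, so the paper itself should be consulted for the authors' exact formulation.

```python
import numpy as np

def momentum_rnn_step(x, h_prev, v_prev, W, U, mu=0.6, s=0.6):
    """One step of a momentum-style recurrent cell (illustrative sketch).

    Assumed recurrence (not taken from the listing):
        v_t = mu * v_{t-1} + s * W x_t
        h_t = tanh(U h_{t-1} + v_t)
    """
    v = mu * v_prev + s * (W @ x)   # velocity accumulates the input drive
    h = np.tanh(U @ h_prev + v)     # hidden state uses the velocity
    return h, v

def run_sequence(xs, W, U, mu=0.6, s=0.6):
    """Run the cell over a sequence, starting from zero hidden state and velocity."""
    hidden = U.shape[0]
    h = np.zeros(hidden)
    v = np.zeros(hidden)
    for x in xs:
        h, v = momentum_rnn_step(x, h, v, W, U, mu, s)
    return h, v

# Small random example: 5 time steps, 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))
U = rng.normal(scale=0.1, size=(4, 4))
xs = rng.normal(size=(5, 3))
h_final, v_final = run_sequence(xs, W, U)
```

Under these assumptions, setting mu=0 and s=1 recovers a plain tanh RNN step, h_t = tanh(U h_{t-1} + W x_t), which makes the momentum term easy to ablate.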

integrating momentum, momentumrnn, name change, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts

Neural Information Processing Systems, May-26-2025, 20:51:18 GMT

Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability in deep learning. SMoE has the potential to exponentially increase in parameter count while maintaining the efficiency of the model by only activating a small subset of these parameters for a given sample. However, it has been observed that SMoE suffers from unstable training and has difficulty adapting to new distributions, leading to the model's lack of robustness to data contamination. To overcome these limitations, we first establish a connection between the dynamics of the expert representations in SMoEs and gradient descent on a multi-objective optimization problem. Leveraging our framework, we then integrate momentum into SMoE and propose a new family of SMoEs, named MomentumSMoE.
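A rough sketch of the two ingredients described here, sparse expert activation and a momentum term on the layer update, might look like the following. It assumes top-k softmax gating over small tanh experts and a heavy-ball update of the form p = mu * p_prev + SMoE(x), x_next = x + gamma * p. The function names, the expert form, and the parameters mu, gamma, and k are all hypothetical choices for illustration and are not taken from the abstract.

```python
import numpy as np

def top_k_smoe(x, gate_W, expert_Ws, k=2):
    """Sparse mixture of experts: only the k highest-scoring experts are evaluated."""
    logits = gate_W @ x                        # one gating score per expert
    top = np.argsort(logits)[-k:]              # indices of the k largest scores
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over the selected experts only
    return sum(w * np.tanh(expert_Ws[i] @ x) for w, i in zip(weights, top))

def momentum_smoe_layer(x, p_prev, gate_W, expert_Ws, mu=0.7, gamma=1.0, k=2):
    """Residual SMoE layer with an assumed heavy-ball momentum state p."""
    p = mu * p_prev + top_k_smoe(x, gate_W, expert_Ws, k)
    x_next = x + gamma * p
    return x_next, p

# Small random example: 4 experts on a 4-dimensional token, applied 3 times.
rng = np.random.default_rng(0)
dim, n_experts = 4, 4
gate_W = rng.normal(size=(n_experts, dim))
expert_Ws = rng.normal(scale=0.1, size=(n_experts, dim, dim))

x = rng.normal(size=dim)
p = np.zeros(dim)
for _ in range(3):
    x, p = momentum_smoe_layer(x, p, gate_W, expert_Ws)
```

With mu=0 and gamma=1 the layer reduces to an ordinary residual SMoE update, x + SMoE(x), so the momentum state is the only difference between the two variants in this sketch.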

artificial intelligence, machine learning, momentumsmoe, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.79)
Information Technology > Artificial Intelligence > Machine Learning (0.79)

Review for NeurIPS paper: MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Neural Information Processing Systems, Jan-21-2025, 22:56:06 GMT

This was *not* demonstrated -- empirically faster convergence is not equatable to acceleration in an optimization sense. It is better to use precise language to separate what you have shown (an architecture with faster convergence) and what you are hypothesizing (it is related to momentum acceleration). As presented, I am not convinced of the latter connection. I think it's fine to say that your method is *inspired* by momentum, but in my opinion the paper implies a much stronger connection that is not substantiated by the theoretical and empirical results. I currently remain unconvinced that the proposed method's improvements are related to momentum at all. There are plenty of simpler explanations, as also offered by Reviewers 3 and 4, which should at least be discussed and ideally ablated. Ultimately, I see this paper as posing an interesting possible connection, but one that is currently speculative and not ready for publication. Aside from the overall writing, I have a few more detailed suggestions for improving the paper.

integrating momentum, momentumrnn, recurrent neural network, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Review for NeurIPS paper: MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Neural Information Processing Systems, Jan-21-2025, 22:55:59 GMT

This paper makes a nice connection between standard ways of regularizing the dynamics of SGD and those of RNNs. Although there are some disagreements between reviewers regarding the theoretical justification, the contribution is of interest to the NeurIPS audience.

integrating momentum, neurips paper, recurrent neural network, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Neural Information Processing Systems, Oct-9-2024, 14:26:11 GMT

integrating momentum, momentumrnn, recurrent neural network, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
